Analysis of Netflix Data (Independent Study)

Authors

  • Abhishek Gupta
  • Abhijeet Mohapatra
  • Jeffrey Ullman
Abstract

Recommendation systems [6] have become indispensable because of the sheer overload of information that web services (Netflix, IMDB, Amazon, Yelp and many others) make available to a user. Recommendation systems are a well-studied research area. In this work, we present our study of the Netflix Challenge [3]. The Netflix Challenge can be summarized as follows: given a movie m and a user u, predict the rating that u would give m from a list of user-movie ratings that may not contain the (u, m) pair. The performance of every approach is measured by the RMSE (root mean-squared error) of the submitted ratings against the actual ratings. Currently, the best system has an RMSE of 0.8585 [4]. In our attempt, we tried a variety of enhancements to the approach that we followed in our CS345a class project. Our previous approach initially seemed very promising, since we obtained an RMSE of 0.8312 for close to 70% of the validation set data. Unfortunately, none of our variants led to an RMSE below the required value of 0.85626. Section 2 gives background on last quarter's approach by summarizing its results. Section 3 contains a detailed description of our current approach, with Section 3.2 focusing on the two variants of collaborative filtering that we experimented with. Finally, Section 4 covers the design choices that motivated our choice of parameters, our error analysis, and our final results.
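
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of computing RMSE between predicted and actual ratings. The function name `rmse` and the sample ratings are hypothetical and are not taken from the paper; the abstract does not describe the authors' own evaluation code.

```python
import math


def rmse(predicted, actual):
    """Root mean-squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on a 1-5 scale, for illustration only.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
score = rmse(predicted, actual)
print(f"RMSE = {score:.4f}")
```

A lower value means the predictions are closer to the actual ratings, which is why the abstract states its target as an RMSE below a fixed threshold.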

Similar resources

Statistical Analysis and Application of Ensemble Method on the Netflix Challenge

1. Introduction The Netflix Prize project was proposed by Netflix Inc. to seek accurate predictions of movie ratings. As one group in the Stanford Netflix Prize team, our responsibility is to explore useful statistics and data curation in the training data set, and to explore ensemble methods for improving prediction accuracy. We imported the Netflix data into a MySQL database for...

Bennett Netflix 100 Winchester Circle

INTRODUCTION The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In 2007, the traditional KDD Cup competition was augmented with a workshop with a focus on the concurrently active Netflix Prize competition [2]. The KDD Cup itself in 2007 con...

CS229: The Netflix Project

This paper investigates the combination and application of a number of machine learning applications to the Netflix Challenge. The algorithm uses extra data in addition to the Netflix training set. Namely, it uses a mapping from Netflix to features gleaned from IMDB, such as the director and genre. Using k-means clustering, the algorithm first clusters the users together by the IMDB features ea...

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

Recommender systems are widely used in E-Commerce for making automatic suggestions of new items that could meet the interest of a given user. Collaborative Filtering approaches compute recommendations by assuming that users, who have shown similar behavior in the past, will share a common behavior in the future. According to this assumption, the most effective collaborative filtering techniques...

Empirical Evaluation of Voting Rules with Strictly Ordered Preference Data

The study of voting systems often takes place in the theoretical domain due to a lack of large samples of sincere, strictly ordered voting data. We derive several million elections (more than all the existing studies combined) from a publicly available data, the Netflix Prize dataset. The Netflix data is derived from millions of Netflix users, who have an incentive to report sincere preferences...

Publication date: 2009